Record Linkage: Making the Most Out of Errors in Linking Variables
Abstract
This paper presents a refinement of the probabilistic medical record linking algorithm. We introduced "close agreement" to account for typical errors in the administrative variables used for record linkage. Linking data on early pregnancy determinants with data on late child outcomes served as a case study. We analyzed whether adding close agreement increased the discriminating power of the linking key, as reflected in a reduction in the number of links with an uncertain linking status. Incorporating close agreement for postal code and date of birth in the record linking algorithm reduced the number of pairs in the uncertain region by 95%. We showed that adding a third outcome, "close," when comparing values of corresponding linking variables led to a major improvement in our probabilistic record linkage study. Similar improvements are likely in other studies because the frequency, nature, and type of errors in other large databases will not be substantially different.
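The three-outcome comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes a Fellegi-Sunter style weight, log2(m/u), for each outcome; the abstract does not state the exact weighting scheme. The probabilities in `PROBS`, the one-character rule used for "close", and the function names `compare`, `weight`, and `pair_weight` are all hypothetical and are not taken from the paper.

```python
from math import log2

# Hypothetical (m, u) probabilities per outcome: m is the chance of the
# outcome among true matches, u among non-matches. Illustrative values
# only, not estimates from the paper.
PROBS = {
    "postal_code": {
        "agree": (0.90, 0.001),
        "close": (0.07, 0.010),
        "disagree": (0.03, 0.989),
    },
    "date_of_birth": {
        "agree": (0.95, 0.003),
        "close": (0.04, 0.030),
        "disagree": (0.01, 0.967),
    },
}


def compare(a: str, b: str) -> str:
    """Return 'agree', 'close', or 'disagree' for two field values.

    Assumption: 'close' means equal length and exactly one differing
    character (a single typing error). The paper's own definition of
    close agreement may differ.
    """
    if a == b:
        return "agree"
    if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
        return "close"
    return "disagree"


def weight(field: str, outcome: str) -> float:
    """Log-likelihood-ratio weight for one field and one outcome."""
    m, u = PROBS[field][outcome]
    return log2(m / u)


def pair_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum the per-field weights for a candidate record pair."""
    return sum(weight(f, compare(rec_a[f], rec_b[f])) for f in PROBS)
```

With only two outcomes, a pair whose postal codes differ by one character would receive the full disagreement weight. With the third outcome it receives an intermediate weight, which is how such a scheme can move true matches with small errors out of the uncertain region.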
Similar resources
Cryptanalysis of Basic Bloom Filters Used for Privacy Preserving Record Linkage
Linking databases containing information on individual characteristics and behavior is of increasing scientific and commercial interest. In many applications, linking databases has to be done without a unique personal number. Hence, due to privacy concerns, privacy preserving record linkage (PPRL) is used most often. In this context encrypted personal quasi-identifiers such as first names, surn...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...
Privacy Preserving Probabilistic Record Linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality
BACKGROUND Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: Either a unique personal identifier, like social security number, is not available or non-unique person identifiable information, like names, are privacy protected and cannot be accessed. A s...
The urge to merge: linking vital statistics records and Medicaid claims.
This paper describes a procedure used to link Medicaid claims data to California vital statistics records for very low birthweight infants. The linkage involved about 53,000 infants born from 1980 to 1987 and 1.46 million claims for delivery/birth-related hospital admissions during the same period. Because the two data files did not share a unique identifier, record linkage required combining e...
Error Estimation in Linking Heterogeneous Data Sources
BACKGROUND Record linkage is the process of bringing together related records that have been compiled separately [1]. Many types of studies have been conducted that have used different methods and approaches to link medical records obtained from heterogeneous data sources. For example, the use of administrative data for research purposes has led to considerable interest in computerized methods ...
Journal title:
- AMIA ... Annual Symposium proceedings. AMIA Symposium
Publication date: 2006